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Abstract. Many of the chemical reactions that take place within a living 
cell are irreversible. Due to evolutionary pressures, the number of allowable 
reactions within these systems are highly constrained and thus the resulting 
metabolic networks display considerable asymmetry. In this paper, we explore 
possible evolutionary factors pertaining to the reduced symmetry observed in 
these networks, and demonstrate the important role environmental variability 
plays in shaping their structural organization. Interpreting the returnability 
index as an equilibrium constant for a reaction network in equilibrium with 
a hypothetical reference system, enables us to quantify the extent to which 
a metabolic network is in disequilibrium. Further, by introducing a new di- 
rected centrality measure via an extension of the subgraph centrality metric 
to directed networks, we are able to characterise individual metabolites by 
their participation within metabolic pathways. To demonstrate these ideas, 
we study 116 metabolic networks of bacteria. In particular, we find that the 
equilibrium constant for the metabolic networks decreases significantly in-line 
with variability in bacterial habitats, supporting the view that environmental 
variability promotes disequilibrium within these biochemical reaction systems. 



1. Introduction 

The set of biochemical metabolic reactions within a living cell form a network 



of chemical transformations known as a metabolic network (Buchanan et al. 2010 



Estrada 2011). Metabolic networks share many interesting features with other 
reaction networks. For instance, the products of a metabolic reaction act as reac- 
tants for other reactions, thus forming a complex network of metabolic reactions. 
The metabolites, i.e. those chemicals involved in metabolic reactions, are small 
molecules which are imported/exported and/or synthesised/degraded inside the 
organism. They can play the role of substrates (reactants) or products. As with 
many other chemical reactions, in metabolic ones there exists another important 
contributor, playing the role of catalyst: the enzymes. Last but not least are 
the coj "actors, which are small molecules that bind enzymes and can enhance or 
decrease their catalytic activities. This combination of substrates, products, en- 
zymes and cofactors make metabolic reactions more complex than usual chemical 
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reactions. Representing such a vast complexity of metabolic transformations is 
a very challenging problem. Typically, a reductionist approach is taken whereby 
one represents the network of substrates and products linked by their direct trans- 
formations into a condensed metabolic network (Lacroix et al. 2008). However, 
such oversimplification can lead to a number of problems when information con- 
cerning specific metabolic pathways or the individual role of metabolites is to be 
inferred (Arita 2004). A global analysis of metabolic networks is less prone to 
such kinds of errors. For instance, Parter et al. (2007) found that certain global 
characteristics of metabolic networks, e.g. modularity, correlated with the level 
of variability within an organisms environment for some 117 bacterial species. 
That is, they found that the more variable the environment, the more modular 
the metabolic network. This study, however, failed to account for an extremely 
important characteristic of metabolic and other reaction networks. That is, un- 
der determined physiological conditions some reactions are irreversible, i.e. they 
take place in one of only two possible directions. The reversibility of a metabolic 
reaction is controlled by the thermodynamics, the kinetics and the stoichiometry 
of the reaction. 

Here we analyze metabolic networks for 116 of the bacterial species previously 



studied by Parter et al. (2007). These species live in a broad range of habitats 



such as oceans, salt lakes, thermal vents, soil and within hosts. Using a classifica- 
tion of these species according to variability in their environments we investigate 
the extent to which the reversibility of an organisms metabolic network impacts 
on its adaptability. That is, we consider the directed network of metabolic re- 
actions in which the directionality of a given reaction is indicated by an arrow 
from substrate to product. Our method consists of formulating a statistical me- 
chanics approach to derive a hypothetical equilibrium constant between the real 
metabolic network and one in which all reactions are reversible. The equilib- 
rium constant measures how far from a globally reversible state, i.e one in which 
all reactions are reversible, a metabolic network is. Calculating the free energy 
of the system with respect to this hypothetical equilibrium provides a measure 
of the 'energetic cost' of the evolutionary process giving rise to the correspond- 
ing metabolic network. Using this approach we conclude that bacteria living in 
more variable environments display on average a higher 'global reversibility' (or 
returnability) in their reactions. In contrast, the metabolic networks of those 
living in very specialized media display significantly reduced levels of returnabil- 
ity. Maintaining the reversibility of all possible metabolic reactions comes at an 
extremely high cost. Thus, increased returnability in metabolic networks can be 
considered as an extreme evolutionary strategy due to pressures imposed by the 
potential lack of certain nutrients within highly variable environments. On the 
other hand, when the media is specialised, a constant stream of nutrients is more 
readily available, allowing the returnability of the metabolic network to be kept 
at a minimum thus saving considerable energy during the evolutionary process. 



3 





(a) Physical system 



(b) Reference system 



Figure 1. Reaction network (left) and its reference system (right) 
consisting of the same set of nodes but with every link being undi- 
rected. 



2. Background 

We begin by reformulating the concept of network returnability, as introduced 
by Estrada & Hatano (2009), in terms of the equilibrium of a directed reaction 



graph and its hypothetical reference system. Thus providing a quantitative mea- 
sure for investigating the extent to which disequilibrium within chemical reaction 
systems may be considered a product of their natural environments. To start, let 
us consider that for each reaction network there exists a hypothetical reference 
system in which all reactions are reversible. In the language of graph theory, 
the reference system of a set of chemical reactions is nothing other than the un- 
derlying, undirected graph for the corresponding reaction graph (see Figure [T]). 
The reference system considers all reactions to be reversible and thus may be 
considered as an ideal equilibrium. Note, that a network in which all links are 
reversible is represented as an undirected graph. 

In order to determine how far from this ideal equilibrium a real chemical reac- 
tion network lies, we consider the difference in free energy between the physical 
and reference systems. Let F p and F r denote the free energies of the reaction and 
reference networks respectively. Then 



(1) 



AF 



where ^{ Pj r} is the partition function of the corresponding (physical or reference) 
system and (3 = 1/kT denotes the inverse temperature^ Here temperature may 
be considered more of a metaphor than a 'real' physical parameter, and quanti- 
fies the extent to which the network under consideration is subject to external 



k is the Boltzmann constant and T the temperature. 
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stresses. For instance, it could represent physiological or extreme environmental 
conditions under which the metabolic network functions. 

Note that the system consisting of only reversible reactions, i.e. the reference 
system, can be considered as being more stable than the physical system for which 
only a fraction of reactions are reversible. Now, by considering the ratio of the 
two partition functions in ([T|, we are able to determine the relative returnability 
(Estrada & Hatano 2009[ ) of a reaction network with respect to its hypothetical 
system 

Z{G) 

where here, we have used the fact that the chemical reaction system can be 
represented by the digraph D = (V, Ed) and the corresponding reference system 
by its underlying graph G = (V, Eg)- 

Recently, Estrada & Hatano (2007) showed that the partition function of an 
undirected graph is given by 

(2) Z(A) = trace (e? A ) , 

where A is the adjacency matrix of the graph. The quantity in (|2| defines the 
so-called Estrada index of a network (De la Peiia et al. 2007). Expanding ^ in 
a Taylor series gives 

B 2 B k 
Z(A) = trace(J) + /3trace(A) + ^trace(A 2 ) H h ^-trace(yr) H . 

Here / denotes the identity matrix. It is well known that the trace {A k ) counts 
the number of closed (self-returning) walks of length k within a graph. Thus, 
trace(A fc ) > for any network containing at least one cycle of length k. The 
partition function Z(G) can then be considered as a weighted sum over all closed 
walks (CWs) within the undirected graph - longer CWs being more heavily pe- 
nalized by the factor l/k\. 

Similarly, one may define the partition function of a directed network as 

6 2 

(3) Z(D) = trace(e /3D ) = trace(J) + fitrace(D) + ^-tiace(D 2 ) + ■■■ 



2! 



+ ^trace( J D fe ) + 



Here, tia.ce(D k ) counts the number of returnable walks of length k starting and 
ending at the same node. Again, trace(Z) fc ) > if and only if the network has 
a returnable cycle of length k. Now, because trace(J) is equal to the number of 
vertices in the graph, and since we are are not interested in the influence network 
size has on the relative 'returnability' of a reaction network, we simply remove 
this term from the partition functions. Hence, we obtain the following augmented 
ratio 

KM = Z(G,P)-n 
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We call the quantity K r (/3) the returnability of a directed network. 

The returnability of a directed network is bounded as < K r ((3) < 1. The 
lower bound being attained for acyclic digraphs, whilst the upper bound is at- 



tained by symmetric digraphs. For the directed cycle on n nodes C n , 
that K. 



we have 



-> as n -> oo ( [Estrada fc Hatano||2009[ ). Note that for f3 = the 
returnability is undefined since Z(D,{3) = Z(G,/3) = 0. This represents the situ- 
ation in which the network is at infinite temperature with every node in isolation 
and thus we may set K r = in this case. Note, that we define the thermodynamic 
equilibrium constant of a network as 



(4) 



pK r = — \ogK r . 



A large value for this index corresponds to a network that lies far from the ideal 
equilibrium in which all reactions are reversible, or, in other words, larger values 
of pK r are associated with less returnable networks. 

For completeness, we note that the differences between the returnability metric 



as defined here and the network reciprocity ( Garlaschelli & Loffredo 2004) are 



substantial. The latter being the fraction of links in a network that are reciprocal 
(bidirected). For example, a directed cycle clearly has zero reciprocity yet it is 
straightforward to realize that the returnability is non-zero in this case. On 
the other hand, the existence of reciprocal links does not guaranty a high level 
of returnability within the network. The interested reader is directed to the 



reference (Estrada & Hatano 2009) for further details. 



We define the returnability of a given chemical (substrate or product) in a 
reaction graph analogously to that of the global returnability. That is, given a 
chemical % we define 

ieP D ) u - 1 



(5) 



K T {P, 



to be the returnability of all chemical reactions in which this chemical takes place. 
In this way, we may consider Equation ^ to be a new, directed centrality mea- 
sure given by the ratio of subgraph centralities (Estrada & Rodriguez- Velaquez 



2005) for the directed and underlying networks respectively. In the current con- 



text, we may interpret this as follows: metabolites that are involved in a larger 
number of pathways, and particularly shorter, more energy efficient ones, should 
be considered as being more important to the metabolic process. 



3. Metabolic Networks 

To illustrate our approach, we begin by investigating whether or not a corre- 
lation exists between environmental variability and the returnability-equilibrium 
constant as defined above. To do this, we analyse metabolic networks for some 
116 bacterial species, each of which can be categorised according to their natu- 
ral environment (see Table [l] for details). The organisms abide in a variety of 
habitats, ranging from highly specialised environments, with little, if any, contact 
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with the outside world - symbiotic bacteria such as buchnera for example - to 
those living in extremely heterogeneous media such as soil, and as such, have 
evolved under very different selective pressures^ 
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Obligate (35) 


21 


89 


280 


23 


102 


322 


Specialised (5) 


206 


235 


299 


222 


258 


336 


Aquatic (4) 


224 


264 


386 


251 


290 


440 


Facultative (41) 


40 


333 


473 


42 


385 


574 


Multiple (28) 


193 


454 


360 


214 


431 


553 


Terrestrial (3) 


282 


385 


421 


329 


463 


498 


Total (116) 


21 


282 


473 


23 


322 


574 



Table 1. Network statistics for the reaction graphs of the bacte- 
rial species studied in this work grouped according to the NCBI 
classification scheme. According to the NCBI, obligate bacteria 
have the most constant environment, followed by specialised and 
aquatic bacteria, and then facultative, multiple and terrestrial bac- 
teria in that order (see (Parter et al. 2007) and references therein 
for further details). 



These metabolic networks are the same as those studied in ( Parter et al.||2007 ) 
which were constructed using the KEGG database QKanehisa fe Goto|2000[ ) . Note 
that, there are a number of ways of representing the set of metabolic reactions 



of an organism as a simple graph (Holme 2009), the most popular of which is a 



substrate-product graph whereby each node corresponds to a metabolite, and a 
connection is made between reacting substances (substrates) and the products 
of the reaction. One potential caveat of such an approach, is that it can lead 
to the erroneous identification of metabolic pathways. Consider, for example, 
the hypothetical reaction given in Figure [2^,. From the corresponding graphical 
representation (Figure |2}d) one may identify a path of length 1 leading from ATP 
to Orthophosphate; however, any chemist would immediately object, noting that 
such a reaction is not possible as ATP must react with H 2 in order to produce 
Orphophosphate . 

Much of the criticism of network based studies of metabolism to date are 



the result of such misrepresentations of the underlying biology (Bourguignon 



et al. 2008, Montanez et al. 2010). We note that a correct treatment that avoids 



such issues is provided via directed hypergraphs (Klamt et al. 2009, Zhou & 



2 The taxonomy used to rank environmental variability is based upon the NCBI classification 


for bacterial lifestyle (see 


Parter et al. 


(2007 


) and references therein for further details). 



(ATP) + (H 2 Q) — > (Orthophosphate) + [ADP] 
(a) Hypothetical reaction. 
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(b) Corresponding substrate-product graph. 

Figure 2. Illustration of a hypothetical reaction and the corre- 
sponding representation as a substrate-product graph. 



Nakhleh 2011). However, since we are not performing a path analysis and in 



order to allow for ease of comparison with related studies (Parter et al. 2007 



Takemoto & Borjigin 2011, Zhou & Nakhleh 2012), we consider the substrate- 



product representation of metabolism in all of our experiments. Note also, that 
we remove so-called currency metabolites, e.g., ATP, NADH and H2O, as they 
tend not to be involved in higher order functions. Additionally, we simplify 
our analysis by considering only the giant connected component of each reaction 
graph. 

We calculate the mean thermodynamic index for 116 of the metabolic networks 



studied by Parter et al. (2007). In all calculations the inverse temperature was 
fixed to /3 = 0.25. Ideally, should be calibrated for each metabolic network ac- 
cording to individual environmental pressures. For example, metabolic networks 
functioning in extreme conditions should exhibit a value of /3 ~ (high temper- 
ature) reflecting the increased environmental stresses endured by such systems. 
However, as the exact conditions for fitting the parameter /3 are unknown, we 
assume an average value for all environments, selected as that value for which 
the different bacterial habitats are best discriminated. Figure [3|a) shows a plot 
of the mean thermodynamic index (pK r ) versus environmental variability for the 
different bacterial species. As can be clearly seen, the highest (pK r ) value was 
obtained for those bacteria in the obligate class, then came the specialised and 
aquatic bacteria, followed by a steady decrease for faculatitive, multiple and ter- 
restrial classes. The group differences observed in Figure [3]^a) were found to be 
significant (P-value: p < 10~ 5 ) according to the Kruskal-Wallis (KW) test. 

Additionally, we found that networks from those species residing in more con- 
stant environments contained significantly (P-value: p < 10~ 5 using the KW 
test) more non-returnable nodes, i.e. nodes such that K r (f3,i) = 0. Figure |3^b) 
shows the fraction of non-returnable metabolites versus environmental variability. 
Note that we can distinguish two subgroups of particular significance within the 
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Environmental Variability 



Figure 3. Relation between environmental variability and (a) the 
mean thermodynamic equilibrium constant (pK r (/3)) (f3 = 0.25), 
(b) the mean number of non-returnable metabolites, i.e. those 
metabolites with zero local returnability K r (0.25, i). In both cases, 
the six bacterial habitats under investigation are ordered along the 
x-axis in accordance with the predicted levels of variability pro- 
vided by Parter et al. (2007): Obligate, Specialised, AQuatic, 
Facultative, Multiple and Terrestrial. Error bars represent stan- 
dard errors. 



set of non-returnable metabolites of a reaction graph: (i) the so-called external 
metabolites, i.e. substrates that are not produced internally by the reaction net- 
work; and (ii) those product metabolites that are not fed back into the system. 
Such metabolites are often referred to respectively as sources and sinks. Clearly, 
we would expect to see a reduction both in the number of external metabolites 
and the number of products not being consumed via feedback mechanisms as 
environmental variability increases. 

We remark that this idea relates to previous studies that have found a corre- 



lation between genome length and variations in bacterial lifestyle (Parter et al. 



2007, Xu 2006), in that, constant environments that provide a steady supply of 



external metabolites lead to redundancy amongst certain genes, which, in turn, 
leads to genome reduction during the evolutionary process. Further evidence sup- 
porting this view was recently provided by Zhou & Nakhleh (2012), who found 
that microbes inhabiting varied, heterogeneous environments displayed a larger 




Figure 4. Visualisation of the metabolic network for the bac- 
terium Buchnera aphidicola. Nodes are proportional to the local 
returnability metric given in ^ with (3 = 0.25. Source and sink 
nodes have been highlighted using red triangles and blue squares 
respectively (colour on-line). 



metabolome than those leading a more specialised lifestyle. For example, |Parez- 



Brocal et al. (2006) have shown that the obligate symbiont Buchnera aphidi- 



cola displays a significantly reduced genome, some 200 kilobases, when compared 
against previously sequenced strains. 

Figure [4] provides an illustration of the above in the case of Buchnera. Sink 
and source nodes are highlighted using red triangles and blue squares respectively 
(colour on-line); whilst nodes with non-zero local returnability are scaled propor- 
tionately. The first interesting observation is the high number of non-returnable 
nodes, approximately 30% of all metabolites. However, perhaps more interesting 
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is the fact that some 65% of these metabolites are obtained from the environ- 
ment, via Buchnera- Aphid symbiosis in this case, thus highlighting the central 
role played by the host in maintaining functional integrity within these systems. 

Note, that in this work we follow the same environmental scheme as the one 
proposed by Parter et al. (2007). However, when considering our results for spe- 
cialized and aquatic bacteria we want to make a call for prudence concerning this 
particular classification scheme. Whilst it is true that some specialised bacteria 
live in more restricted environments than aquatic ones, Figure [3] deals with the 
average thermodynamic equilibrium constant across both specialised and aquatic 
bacteria. It can clearly be seen that specialized bacteria display a significantly 
larger variance in their values of pK r than aquatic ones. The reason for this 
apparent anomaly can be understood as follows. The specialized bacteria con- 
sidered in this study are all so-called thermophiles, i.e. A. aeolicus, C. tepidum, 
T. Tengcongensis, T. Elongatus and T. Maritima. However, these bacteria vary 
significantly with respect to several other environmental factors. For instance, 
A. aeolicus requires oxygen to survive whilst C. tepidum is an anaerobic pho- 
totrophic bacterium. Habitats for this last bacterium are quite variable including 
anoxic and sulfide-rich waters, mud, sediments, microbial mats, and even micro- 
bial consortia. The range of temperatures in which these bacteria live also varies 
significantly, ranging from an optimum growth temperature of 55 °C for T. Elon- 
gatus to water temperature of 80 °C for T. Maritima. On the other hand, the 
aquatic bacteria studied here ( C. crescentus, Prochlorococcus, Synechococcus and 
Synechocystis sp.) live in globally less variable environments. The main difference 
being that C. crescentus and Synechocystis sp. live in fresh water environments 
while the others live in marine ones. However, many species of Synechococcus 
have also been observed in freshwaters. In order to obtain a sense of the level of 
'specialization' of some aquatic bacteria we mention further the fact that Syne- 
chococcus are also found in oceanic areas which are nutrient depleted, such as the 
central gyres. Altogether, these facts support our results that indicate (i) a lack of 
any significant differences between pK r values for specialised and aquatic groups; 
and (ii) a significantly increased variability in the values of pK r for specialised 
bacteria. 

Next, we analyse the role of individual metabolites by considering the local 
returnability metric for the 116 metabolic networks studied above. The local 
returnabilty is given by Equation ([5]) and represents a new type of directed cen- 
trality measure. First, we analyse the extent to which the information contained 
in this new descriptor is reproduced by other well-known centrality measures. In 
Figure [5] we plot the Spearman rank correlation coefficients between the local 
returnability and the betweenness, eigenvector, in- and out-degree centralities. 
In all cases we have considered only directed versions of these network measures. 
As can be clearly seen, the correlations are low indicating that the information 
contained in the local returnability metric is not reproduced by any of the other 
centrality measures. In particular, the eigenvector centrality shows extremely 
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Figure 5. Spearman rank correlation between the local return- 
ability metric K r ((3,i), with /3 = 0.25, and the centrality measures 
of betweenness, eigenvector, in-degree and out-degree for 116 bac- 
terial metabolic networks. 

poor correlations, indicating that the local returnability provides a unique spec- 
tral measure of centrality. 

In Figure [6] we plot the local returnability profiles for illustrative bacterial net- 
works from the six environmental lifestyles. Only metabolites that occur across 
all six organisms (68 in total) are displayed; in all cases, metabolites are ordered 
according to the ranking provided by Buchnera. As can be seen there are a 
number of universal characteristics shared by these metabolites. Firstly, a small 
number of metabolites are found to be highly returnable across all six networks, 
e.g. L-tryptophan, indole and indoleglycerol phosphate. Furthermore, a signifi- 
cant overlap, approximately 60%, was found between non-returnable metabolites 
across the six featured networks. In addition to these similarities, a number of 
substantial differences are readily observed through the plots in Figure [6] Perhaps 
the most striking of which, is the large number of non-returnable metabolites (24 
of a possible 68) displayed by the obligate bacterium Buchnera. Overall, we see 
that despite their differences, dramatic changes in the profiles of these common 
metabolites are not present. This can be understood due to the existence of a 
certain core of common, or universal, metabolites that have evolved in a 'stan- 
dard way' across the different bacteria. Similarity between returnability scores 
across this common core can be easily calculated using the Pearson correlation 
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Figure 6. Local returnability profile for networks from each of 
the six bacterial lifestyles. Values of K r (0.25,i) are given for those 
metabolites that occur in each of the networks (68 in total): B. 
aphidicola (buc); T. tengcongensis (tte); C. crescentus (ccr); E. 
coli (eco); L. lactis (11a) and B. subtilis (bsu). 

coefficient, and we have observed (data not shown) that these correlations are 
always greater than 0.87, and in some cases as high as 0.98, clearly supporting 
our hypothesis that variability in returnability across these core metabolites is 
significantly reduced. Thus, the main differences in local and global variability 
can be attributed to a smaller set of metabolites that are more or less specific to 
the different kinds of bacteria (according to their environments). Such variations 
in the local returnability are well described by the values of K r (/3, i) which can be 
further used as an indicator of adaptability and evolution for specific metabolites. 

4. Conclusions 

In this paper, we considered the difference in free energies, between a chemical 
reaction network and its 'hypothetical' equilibrium, to be a proxy for the extent 
to which a metabolic system is in disequilibrium. In particular, we found that 
on average 'global reversibility' increases significantly in line with environmen- 
tal variability, supporting the view that organism adaptability leads to increased 
complexities in the resultant metabolic networks. In addition, we have proposed a 
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new, directed centrality measure for characterising nodes (metabolites) in terms 
of the returnable pathways they participate. For the 116 metabolic networks 
studied here, the new measure does not show strong correlations with other di- 
rected centrality measures, providing a distinctly different ranking of metabolites. 
Moreover, we used the aforementioned local returnability metric to analyse the 
role of individual metabolites for illustrative networks across the six different 
lifestyles studied. We found that common metabolites across these networks 
were extremely highly correlated, indicating that differences displayed in local 
and global returnability may be attributed to a small set of environment specific 
metabolites. 
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